The Allotrope Data Format (ADF) [[!ADF]] consists of several APIs as well as ontologies. It defines an interface and file format for storing scientific observations from analytical chemistry. This document constitutes the specification of the Allotrope Data Format Data Cube (ADF-DC) API for storing and reading analytical data. It defines how to store one- or multi-dimensional data.
THESE MATERIALS ARE PROVIDED "AS IS" AND ALLOTROPE EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, INCLUDING, WITHOUT LIMITATION, THE WARRANTIES OF NON-INFRINGEMENT, TITLE, MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
This document is part of a set of specifications on the Allotrope Data Format [[!ADF]].
The Allotrope Data Format (ADF) defines an interface and file format for storing scientific observations from analytical chemistry. It is intended for long-term stability of archived analytical data and fast real-time access to it. The ADF Data Cube API (ADF-DC) defines an interface for storing n-dimensional analytical result data in the form of data cubes. ADF-DC uses the vocabulary of the W3C Data Cube Ontology [[!QB]] to describe the basic structure and meta data of data cubes and observations. The ADF Data Cube Ontology [[!ADF-DCO]] extends [[!QB]] with advanced concepts, such as specific selections of subsets on data cubes, scale types, complex data types and order functions.
ADF is based on the Hierarchical Data Format [[!HDF5]], which is specifically designed to store large amounts of numerical data. The ADF Data Cube to HDF5 Mapping Ontology [[!ADF-DCO-HDF]] provides classes and properties to define the mapping between the abstract data cubes, defined in terms of the data cube ontology, and their concrete HDF5 representations in the ADF file. That is, ADF-DCO-HDF defines the mapping between functional and physical representations. The physical representation in HDF5 is described by an HDF5 ontology, which is based on the official HDF5 specifications [[!HDF5]].
This document is structured as follows: First, the role of the ADF Data Cube API within the ADF high-level structure is shown. Second, the general requirements for an ADF Data Cube API are listed. Third, a use case data cube on mass spectroscopy is described, which will be referred to in later examples to illustrate the specified methods. Finally, the different API methods for creation, writing and reading are specified in detail with their corresponding parameters. For each of the specified methods, example RDF representations of the corresponding meta data are provided.
The IRI of an entity has two parts: the namespace and the local identifier.
Within one RDF document, the namespace may be associated with a shorter prefix. For instance, the namespace IRI http://www.w3.org/2002/07/owl# is commonly associated with the prefix owl:, and one can write owl:Class instead of the full IRI http://www.w3.org/2002/07/owl#Class.
Within this specification, the following namespace prefix bindings are used:
Prefix | Namespace |
---|---|
owl: | http://www.w3.org/2002/07/owl# |
rdf: | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
rdfs: | http://www.w3.org/2000/01/rdf-schema# |
xsd: | http://www.w3.org/2001/XMLSchema# |
dct: | http://purl.org/dc/terms/ |
skos: | http://www.w3.org/2004/02/skos/core# |
qb: | http://purl.org/linked-data/cube# |
af-x: | http://purl.allotrope.org/ontologies/property# |
adf-dp: | http://purl.allotrope.org/ontologies/datapackage# |
adf-dc: | http://purl.allotrope.org/ontologies/datacube# |
adf-dc-hdf: | http://purl.allotrope.org/ontologies/datacube-to-hdf5-map# |
ex: | http://example.com/ns# |
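The expansion of a prefixed name into a full IRI can be sketched as follows. This is a minimal illustration only: the function name expand and the dictionary PREFIXES are hypothetical and not part of any ADF API, and only a subset of the bindings from the table is included.

```python
# Prefix bindings copied from the table above (subset only).
PREFIXES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "qb": "http://purl.org/linked-data/cube#",
    "adf-dc": "http://purl.allotrope.org/ontologies/datacube#",
    "ex": "http://example.com/ns#",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as 'owl:Class' to its full IRI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {prefixed_name!r}")
    return PREFIXES[prefix] + local


print(expand("owl:Class"))  # http://www.w3.org/2002/07/owl#Class
```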
Within this document the definitions of MUST, SHOULD and MAY are used as defined in [[!rfc2119]].
This document MAY use the Unified Modeling Language [[UML]] to illustrate some concepts and visualize RDF graphs. These diagrams are non-normative and SHOULD NOT be interpreted in the strict sense specified by the UML specification.
Within this document, decimal numbers will use a dot "." as the decimal mark.
The next figure illustrates the ADF Data Cube API within the high-level structure of the Allotrope Data Format (ADF) [[!ADF]] API stack:
This document specifies the methods which MUST be provided by the ADF Data Cube API.
The following key requirements MUST be addressed by the ADF Data Cube API:
The following figure describes a typical Liquid Chromatography Mass Spectroscopy (LC/MS) run for a single sample:
This use case can be represented by a data cube with the following structure. Two dimension components:
This section describes the core operations that MUST be provided by the ADF-DC API: creating a data cube as well as writing data into and reading (subsets of) data from it. These methods are described in the following subsections.
According to the RDF Data Cube Vocabulary [[!QB]], a data cube (qb:DataSet) MUST specify a structure, which is expressed through a data structure definition (DSD, qb:DataStructureDefinition). The DSD defines the components of a data cube through component specifications (qb:ComponentSpecification), which describe either dimension components (adf-dc:Dimension) or measure components (adf-dc:Measure).
Details on classes and properties of the ADF Data Cube Ontology are specified in [[!ADF-DCO]].
The ADF-DC API MUST provide a method to create a data cube by specifying all its components, i.e. by explicitly specifying the data structure through dimensions and measures. The API MUST provide a method to create a data cube by reusing an existing data structure definition (DSD) which is represented in the form of RDF triples. The detailed specifications of these methods are given in the following subsections.
In general, the API methods for creation of a data cube MUST provide a parameter to specify an IRI for the data cube. Specification of a label or title for a data cube SHOULD be possible as well.
The API MUST provide a method to describe the structure of the data cube by explicitly defining one or more measures and, optionally, one or more dimensions.
A measure (adf-dc:Measure) is a component specification for dependent data values, which represent the measured values.
The required parameters and required meta data descriptions that MUST be persisted in ADF-TS are described in the following subsections.
The API method for definition of a measure of a data cube MUST provide the following parameters:
- The measure property, which is either an owl:ObjectProperty or an owl:DatatypeProperty.
- The component data type. For an owl:DatatypeProperty, the IRI of the component data type MUST be one of the standard XSD data types (xsd:integer, xsd:decimal, xsd:string etc.). For an owl:ObjectProperty, the IRI of the component data type MUST be either rdfs:Resource or a complex data type represented by an IRI that represents a data shape [[!SHACL-ED]]. If a data shape is specified, its structure SHOULD be accessible to the API, e.g., by explicit representation within ADF-TS.
- An order function (adf-dc:OrderFunction). Any order function MUST be defined according to the component data type. For instance, a native order (adf-dc:nativeOrder) is only applicable to components with a primitive component data type that represents real numbers.
- A scale type: adf-dc:NominalScale, adf-dc:OrdinalScale, adf-dc:CardinalScale, adf-dc:IntervalScale or adf-dc:RatioScale.
The API method for definition of a measure of a data cube MUST persist the description of the measure in ADF-TS according to the following structure:
ex:intensityMeasure
    a adf-dc:RatioScale ,                   # MAY
        adf-dc:Measure ,
        qb:ComponentSpecification ;
    adf-dc:componentDataType xsd:int ;
    qb:measure «af-x:intensity» .           # the measure property
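The data type rule and the persisted structure for a measure can be sketched as follows. This is an illustration under stated assumptions, not the ADF-DC API: the function names, the string values "datatype" and "object", and the representation of triples as tuples of prefixed names are all hypothetical. Checking for the xsd: prefix is only an approximation of "one of the standard XSD data types".

```python
def check_component_data_type(property_kind, data_type, known_shapes=()):
    """Check the component data type rule for a measure (hypothetical helper).

    property_kind is "datatype" for an owl:DatatypeProperty and
    "object" for an owl:ObjectProperty.
    """
    if property_kind == "datatype":
        # Datatype properties require a standard XSD data type.
        return data_type.startswith("xsd:")
    if property_kind == "object":
        # Object properties require rdfs:Resource or a known data shape.
        return data_type == "rdfs:Resource" or data_type in known_shapes
    raise ValueError(f"unknown property kind: {property_kind!r}")


def measure_triples(measure, measure_property, data_type, scale_type=None):
    """Return the triples describing a measure, mirroring the example above."""
    triples = [
        (measure, "rdf:type", "adf-dc:Measure"),
        (measure, "rdf:type", "qb:ComponentSpecification"),
        (measure, "adf-dc:componentDataType", data_type),
        (measure, "qb:measure", measure_property),
    ]
    if scale_type is not None:  # the scale type is optional (MAY)
        triples.append((measure, "rdf:type", scale_type))
    return triples


if check_component_data_type("datatype", "xsd:int"):
    for triple in measure_triples("ex:intensityMeasure", "af-x:intensity",
                                  "xsd:int", "adf-dc:RatioScale"):
        print(triple)
```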
A dimension (adf-dc:Dimension) is a component specification for independent data values.
The required parameters and required meta data descriptions that MUST be persisted in ADF-TS are described in the following subsections.
The API method MUST provide the following parameters:
- The dimension property, which is either an owl:ObjectProperty or an owl:DatatypeProperty.
- The component data type: xsd:integer, xsd:decimal, xsd:string etc. or a complex data type represented by a data shape. The specified data shape MUST be accessible to the API, e.g., by explicit representation within the ADF-TS.
- An order function (adf-dc:OrderFunction). Any order function MUST be defined according to the component data type.
- A scale type: adf-dc:NominalScale, adf-dc:OrdinalScale, adf-dc:CardinalScale, adf-dc:IntervalScale or adf-dc:RatioScale.
- A qb:order, specified through an integer which defines the order of the dimension components. The order value must be distinct for different dimension components of one data structure definition.
The API method MUST persist the description of the dimension in ADF-TS as follows:
ex:sampleIndexDimension
    a adf-dc:RatioScale ,                                   # MAY
        adf-dc:Dimension ,
        qb:ComponentSpecification ;
    adf-dc:componentDataType afs-qudt:ArbitraryUnitValue ;
    adf-dc:orderedBy adf-dc:nativeOrder ;                   # SHOULD
    qb:dimension «af-x:index» ;
    qb:order "1"^^xsd:long .                                # SHOULD
ex:UseCaseDataSet
    a qb:DataSet ;
    rdfs:label "Data Cube for LC/MS use case" ;   # SHOULD
    dct:title "Data Cube for LC/MS use case" ;    # SHOULD
    qb:structure ex:UseCaseDSD .
ex:UseCaseDSD
    a qb:DataStructureDefinition ;
    qb:component ex:sampleIndexDimension ,
        ex:massPerChargeDimension ,
        ex:retensionTimeDimension ,
        ex:intensityMeasure .
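Two of the structural constraints stated so far (at least one measure, and distinct order values across the dimensions of one DSD) can be checked with a sketch like the following. The dictionary representation of components and the function name validate_dsd are hypothetical. The order values 2 and 3 for the mass-per-charge and retention time dimensions are assumptions made for the example; only the order value 1 of the sample index dimension appears in this document.

```python
def validate_dsd(components):
    """Return a list of constraint violations for a DSD (hypothetical helper).

    Each component is a dict with a "kind" ("dimension" or "measure")
    and, for dimensions, an optional integer "order".
    """
    errors = []
    if not any(c["kind"] == "measure" for c in components):
        errors.append("at least one measure is required")
    orders = [c["order"] for c in components
              if c["kind"] == "dimension" and c.get("order") is not None]
    if len(orders) != len(set(orders)):
        errors.append("dimension order values must be distinct")
    return errors


use_case_dsd = [
    {"iri": "ex:sampleIndexDimension", "kind": "dimension", "order": 1},
    {"iri": "ex:massPerChargeDimension", "kind": "dimension", "order": 2},  # assumed
    {"iri": "ex:retensionTimeDimension", "kind": "dimension", "order": 3},  # assumed
    {"iri": "ex:intensityMeasure", "kind": "measure"},
]
print(validate_dsd(use_case_dsd))  # []
```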
The structure of a data cube is expressed by a data structure definition (DSD).
The API method for definition of a data cube by reference to a DSD MUST provide the following parameters:
- The IRI of a qb:DataStructureDefinition which defines the dimensions and measures of the data cube.
There MUST be a way to reuse an explicitly created DSD, since many data cubes share a common structure.
There MAY be a way to reuse an implicitly created DSD. A DSD is created implicitly for every created data cube that does not reuse an existing DSD.
Thus, the API MUST provide a method to specify the structure of a data cube at creation time by reference to an IRI of a predefined DSD.
The detailed requirements of the method are described next.
The method for creation of a reusable DSD is described afterwards.
Parameters
The API method MAY provide the following parameters:
The API method MUST persist the description of the data cube in ADF-TS as described above. In this case, however, the description of the DSD is already available and does not have to be defined again.
ex:UseCaseDataSet
    a qb:DataSet ;
    rdfs:label "Data Cube for LC/MS use case" ;   # MAY
    dct:title "Data Cube for LC/MS use case" ;    # MAY
    qb:structure ex:UseCaseDSD .                  # MUST reference the reused DSD
The API MUST provide a method to create a reusable data structure definition (qb:DataStructureDefinition) that can be referenced at creation of a data cube.
In general, the parameters for creation of measures and dimensions listed above for explicit creation of a data cube MUST be provided also for creation of a DSD.
Further requirements for the creation of a DSD are described next.
The API method for definition of a data structure definition (DSD) MUST provide parameters for creation of measures and dimensions as listed above. Furthermore, the following parameters MUST be provided:
The API method for definition of a DSD MUST persist the description of the DSD in ADF-TS as follows:
ex:UseCaseDSD                             # MUST: IRI of the DSD
    a qb:DataStructureDefinition ;        # MUST
    rdfs:label "LC/MS use case DSD" ;     # SHOULD: a label for the DSD
    qb:component ex:sampleIndexDimension ,
        ex:massPerChargeDimension ,
        ex:retensionTimeDimension ,
        ex:intensityMeasure .
According to the ADF-DCO, a data selection is an n-dimensional subset of data of a data cube.
A data selection is specified in the form of a selection structure definition (adf-dc:SelectionStructureDefinition), which is defined as "a set of component selections on the components of a data structure definition". The selection is based on dimension (adf-dc:Dimension) and measure (adf-dc:Measure) components.
For each dimension component exactly one dimension selection MUST be defined.
For measure components, at least one measure selection MUST be defined.
The ADF-DC API MUST provide a method to create data selections. The API MUST provide methods to create data selections based on business values (functional selection). The API MAY provide methods to create data selections based on index values (physical selection). In general, the scale type of a component determines which types of selections are possible. For example, on nominal scales, only point selections are possible. Furthermore, range selections MUST only be allowed when components have an associated order function.
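The two rules on selection types can be sketched as a small decision function. This is only an illustration: the function name and the strings "point" and "range" are hypothetical, and the sketch assumes that point selections are available on every scale type, which this document states explicitly only for nominal scales.

```python
def allowed_selections(scale_type, has_order_function):
    """Return the selection kinds permitted for a component (sketch)."""
    kinds = {"point"}  # assumption: point selections are always possible
    # Range selections need an order function and are not possible
    # on nominal scales.
    if scale_type != "adf-dc:NominalScale" and has_order_function:
        kinds.add("range")
    return kinds


print(sorted(allowed_selections("adf-dc:NominalScale", True)))  # ['point']
print(sorted(allowed_selections("adf-dc:RatioScale", True)))    # ['point', 'range']
```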
The API MUST provide methods for specifying a dimension selection (adf-dc:DimensionSelection), i.e. a selection on a dimension component.
The API method for selections on dimension scales MUST provide the following parameters:
- For scale types other than adf-dc:NominalScale, the API MUST provide a parameter for specifying the selection of values from a scale dimension by a value range with minimum and maximum value.
- For the data type qudt:QuantityValue, the API MUST support a range selection for values with different units but the same quantity kind as defined in [[!QUDT]].
The API MUST provide methods for specifying a measure selection (adf-dc:MeasureSelection), i.e. a selection on a measure component.
The scale type of the measure component determines which selections are possible.
The API method for selections on measures MUST provide the same parameters as specified for a dimension selection. Additionally, it MUST provide the following parameters:
The API SHOULD provide a convenience method to select the complete content of the data cube.
The ADF-DC API MUST provide a method to write data into a data cube. There MUST be methods for writing data into a data cube by simple n-dimensional array structures. Writing SHOULD be done via data selections.
For all methods that write data into a data cube, the API MUST provide exactly one parameter for the IRI of the corresponding target data cube. Other specific parameters are described below.
Writing into a data cube SHOULD be based on the following principle: A source data selection is written to a target data selection. The values of the source data selection are read in the order of the dimensions and written in the same way into the target data selection. The following figure illustrates this principle: On a 3x5 data cube, a 3x2 data selection (marked in green) is created. This source data selection is written to a 2x3 data selection (marked in red) on the target 3x3 data cube.
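The write principle can be illustrated with a minimal two-dimensional sketch using nested lists. Several points are assumptions: the function names are hypothetical, "order of the dimensions" is interpreted as enumeration with the first dimension varying slowest, and the selected positions (columns 1 to 2 of the source, rows 0 to 1 of the target) are chosen for the example because the figure is not reproduced here.

```python
from itertools import product


def select_indices(ranges):
    """Enumerate index tuples of a selection, first dimension varying slowest."""
    return list(product(*ranges))


def write_selection(source, source_ranges, target, target_ranges):
    """Copy the source selection into the target selection in enumeration order."""
    src = select_indices(source_ranges)
    dst = select_indices(target_ranges)
    if len(src) != len(dst):
        raise ValueError("source and target selections differ in size")
    for (si, sj), (ti, tj) in zip(src, dst):
        target[ti][tj] = source[si][sj]


# 3x5 source cube holding the values 0..14; 3x3 target cube, initially empty.
source = [[r * 5 + c for c in range(5)] for r in range(3)]
target = [[None] * 3 for _ in range(3)]

# 3x2 source selection (all rows, columns 1-2) written to a
# 2x3 target selection (rows 0-1, all columns).
write_selection(source, [range(3), range(1, 3)], target, [range(2), range(3)])
print(target)  # [[1, 2, 6], [7, 11, 12], [None, None, None]]
```

Both selections contain six values, so the shapes may differ as long as the number of selected cells matches.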
The API method for writing data into a data cube MUST provide the following parameters:
The ADF-DC API MUST provide a method to read the data from a data cube. In particular, the API MUST provide a method to read data in the form of n-dimensional arrays. In general, the API MAY realize the reading of a data cube by creating a (copy of the) data cube that is to be read. Thus, the principle for reading is the same as for writing, only in reverse: a data selection on a source data cube is written into a selection of a target data cube. Because of this, the API method for writing MAY be reused and the API MAY do without a separate method for reading.
The ADF-DC API MUST provide a method to read the complete data from a data cube.
The API method for reading from a data cube MUST provide the following parameters:
The API MUST provide a method for reading data from a data cube by specifying a data selection.
The API method for reading from a data cube MUST provide the following parameters:
Complex data types are represented in RDF as data shapes using the Shapes Constraint Language [[!SHACL-ED]]. They are referenced, e.g., by measure and dimension component specifications. As described above, the API MUST provide a method parameter to specify complex data types for dimension and measure components. The API MAY also provide a method to create new complex data types that can be referenced.
Persisting data in HDF5 files poses some additional requirements.
Regarding dimension components there are some HDF5 specifics that MAY be provided by the API. These are listed in the following subsections.
Chunking allows storage space to be managed far more efficiently.
A chunk size of 1000 means that dimension values are stored in chunks of 1000 * (size of the data type used), e.g. 1000 * 8 bytes per double value = 8 KB chunks. Only chunks that actually hold data require storage space. Without chunking, the storage space requested for the dimension would be allocated completely upon creation.
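The chunk arithmetic can be reproduced with a short sketch. The function names are hypothetical, and the calculation assumes that values are written contiguously from the first index and ignores any HDF5 metadata overhead.

```python
import math


def chunk_bytes(chunk_size, item_bytes):
    """Size of one chunk in bytes."""
    return chunk_size * item_bytes


def allocated_bytes(values_written, chunk_size, item_bytes):
    """Storage for a contiguously filled dimension: only touched chunks count."""
    chunks = math.ceil(values_written / chunk_size)
    return chunks * chunk_bytes(chunk_size, item_bytes)


print(chunk_bytes(1000, 8))            # 8000 bytes, the 8 KB chunk from the text
print(allocated_bytes(2500, 1000, 8))  # 3 chunks -> 24000 bytes
```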
The API MAY provide a parameter of the dimension creation method to specify the scale of a dimension by scale mappings in order to allow more efficient storage of dimension values. If the API provides a corresponding parameter, it SHOULD be possible to specify the following scale mappings:
- adf-dco-hdf:IdentityScaleMapping, which specifies that the dimension index is equal to the dimension values.
- adf-dco-hdf:ExplicitScaleMapping, which specifies a dimension mapping that defines the explicit mapping of dimension index to dimension values.
- adf-dco-hdf:FunctionScaleMapping, which specifies an index function that defines the mapping of a dimension index to dimension values by some mathematical function. For example, linear functions and different logarithms are provided by [[!ADF-DCO-HDF]].
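The three kinds of scale mapping can be sketched as functions from a dimension index to a dimension value. The Python function names below are hypothetical and only mirror the ontology classes conceptually; the linear coefficients are assumed values chosen for illustration.

```python
def identity_mapping():
    """Identity scale mapping: the dimension value equals the index."""
    return lambda index: index


def explicit_mapping(values):
    """Explicit scale mapping: one stored dimension value per index."""
    return lambda index: values[index]


def function_mapping(func):
    """Function scale mapping: the value is computed from the index."""
    return lambda index: func(index)


# Assumed linear index function: value = 0.5 * index + 100.0
linear = function_mapping(lambda i: 0.5 * i + 100.0)

print(identity_mapping()(3))                 # 3
print(explicit_mapping([1.2, 3.4, 9.9])(1))  # 3.4
print(linear(4))                             # 102.0
```

Identity and function mappings need no stored dimension values, which is where the storage saving comes from; an explicit mapping stores every value.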
Version | Release Date | Remarks |
---|---|---|
0.3.0 | 2015-04-30 | |
0.4.0 | 2015-06-18 | |
1.0.0 | 2015-09-29 | |
1.1.0 RC | 2016-03-11 | |
1.1.0 RF | 2016-03-31 | |
1.1.5 | 2016-05-13 | |
1.2.0 Preview | 2016-09-23 | |
1.2.0 RC | 2016-12-07 | |
1.3.0 Preview | 2017-03-31 | |
1.3.0 RF | 2017-06-30 | |
1.4.3 RC | 2018-10-11 | |
1.4.5 RF | 2018-12-17 | |
1.5.0 RC | 2019-12-12 | |
1.5.0 RF | 2020-03-24 | |
1.5.3 RF | 2020-11-30 | |